2. Restate your questions. Has this changed at all since HW #1? If yes, how so?
Is there a gender gap in academic publishing and if so, what does it look like?
What is the gender gap across different academic disciplines?
How has the gender gap changed over time?
What does the gender gap look like from country to country?
3. Explain which variables from your data set you will use to answer your questions, and how.
The variables I will use to answer these questions are as follows:
authors: int; represents the total number of authored publications for a specific country, gender, time period, and field.
country: chr; the country of origin of the publications
gender: chr; the gender of the authors
subject_area_or_subfield: chr; the academic discipline the paper was published in
period: fct; one of two time periods: 1993-2003 or 2014-2018. The data contain no information on publications between or after these periods.
All this information comes from one dataset about gender and academic publishing created by Elsevier.
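As a rough sketch of how `period` could answer the over-time question, the snippet below computes women's share of authors per field and period on a small made-up tibble, then takes the difference between the two periods. The field names and counts are hypothetical placeholders, not Elsevier values, and the code assumes `gender` is coded as "Men" and "Women", as the `field_gender_Men` / `field_gender_Women` column names in the wrangling code suggest.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the same long shape as the Elsevier table:
# one row per field x period x gender, with a count in `authors`.
# Assumes gender is coded as "Men" / "Women".
toy <- tibble::tribble(
  ~subject_area_or_subfield, ~period,     ~gender, ~authors,
  "Field A",                 "1993-2003", "Men",   80,
  "Field A",                 "1993-2003", "Women", 20,
  "Field A",                 "2014-2018", "Men",   60,
  "Field A",                 "2014-2018", "Women", 40,
  "Field B",                 "1993-2003", "Men",   50,
  "Field B",                 "1993-2003", "Women", 50,
  "Field B",                 "2014-2018", "Men",   45,
  "Field B",                 "2014-2018", "Women", 55
) %>%
  mutate(period = factor(period, levels = c("1993-2003", "2014-2018")))

# Women's share of all authors within each field and period
women_share <- toy %>%
  group_by(subject_area_or_subfield, period) %>%
  summarise(pct_women = sum(authors[gender == "Women"]) / sum(authors),
            .groups = "drop")

# One row per field, one column per period, plus the change between periods
share_change <- women_share %>%
  pivot_wider(names_from = period, values_from = pct_women) %>%
  mutate(change = `2014-2018` - `1993-2003`)

print(share_change)
```

A positive `change` means women's share grew between the two periods. Because only two periods exist, this is a before/after comparison rather than a trend.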
4. Find at least two data visualizations that you could borrow/adapt pieces from and explain which elements you might borrow.
Source: Georgios Karamanis
I like this data viz as a way I might show the gender gap across different academic disciplines. Currently, I’m using a bubble chart with the percentage of total publications by gender. I like this viz a lot because it really centers the industries themselves rather than the numbers, and I think this approach would work nicely with the data I have.
Source: Georgios Karamanis
If I wanted to switch up my variables and display the gender gap by country with a dumbbell chart, it could look something like this. I like how the x-axis values are also printed inside the bubbles themselves, so you don’t have to work too hard to see what they are. I also like the gender color choices here.
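One way to borrow the labels-inside-the-bubbles idea is to draw large points and layer `geom_text()` on top of them at the same x and y positions. The sketch below uses two made-up countries with placeholder shares (not real data), assumes hypothetical wide columns named `pct_women` and `pct_men`, and reuses the pink and purple from the current dumbbell chart.

```r
library(ggplot2)
library(scales)

# Hypothetical placeholder values, one row per country (not real data)
countries <- data.frame(
  country   = c("Country X", "Country Y"),
  pct_women = c(0.35, 0.42),
  pct_men   = c(0.65, 0.58)
)

p <- ggplot(countries, aes(y = country)) +
  # Connecting segment between the two endpoints
  geom_segment(aes(x = pct_women, xend = pct_men, yend = country),
               color = "gray60", linewidth = 1) +
  # Large points so a label fits inside each one
  geom_point(aes(x = pct_women), color = "#ec9bfc", size = 12) +
  geom_point(aes(x = pct_men), color = "#6A1E99", size = 12) +
  # Print each value inside its point
  geom_text(aes(x = pct_women, label = percent(pct_women, accuracy = 1)),
            size = 3, color = "black") +
  geom_text(aes(x = pct_men, label = percent(pct_men, accuracy = 1)),
            size = 3, color = "white") +
  scale_x_continuous(labels = percent, limits = c(0, 1)) +
  labs(x = "Share of publications", y = NULL) +
  theme_minimal()

p
```

The text color is chosen per point (black on the light pink, white on the dark purple) so the labels stay readable; the point `size` may need tuning once real country names and the Lexend font are in place.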
5. Hand-draw your anticipated visualizations
Hand-drawn final visualization
6. Mock up all of your hand drawn visualizations using code
Load libraries
library(tidyverse)
library(here)
library(janitor)
library(readxl)
library(patchwork)
library(showtext)
library(glue)
library(ggtext)
library(scales)

# Enable showtext so custom fonts render in plots
showtext_auto()

# Match showtext's text resolution to 300 dpi output
showtext_opts(dpi = 300)

# Load the Lexend font from Google Fonts
font_add_google(name = "Lexend", family = "lexend")
# Calculate summary stats grouped by field
# (author_stats is assumed to be the cleaned Elsevier data from an earlier step)
fields <- author_stats %>%
  group_by(subject_area_or_subfield, gender) %>%
  summarise(field_gender = sum(authors), .groups = "drop_last") %>%
  # Still grouped by field here, so totals and shares are per field
  mutate(total_authors = sum(field_gender),
         percent_field = field_gender / total_authors) %>%
  ungroup()

# Pivot wider to add columns with the number and share of authors by field for men and women
fields_wide <- fields %>%
  pivot_wider(id_cols = c(subject_area_or_subfield, total_authors),
              names_from = gender,
              values_from = c(field_gender, percent_field)) %>%
  mutate(gender_gap = field_gender_Men - field_gender_Women,
         percent_gender_gap = percent_field_Men - percent_field_Women) %>%
  # Drop the aggregate "ALL" row so only individual fields remain
  filter(subject_area_or_subfield != "ALL")
Dumbbell chart of total publications by gender across disciplines
# Reorder data by the gender gap from high to low (largest gap at the top of the plot)
fields_wide <- fields_wide %>%
  mutate(subject_area_or_subfield = fct_reorder(.f = subject_area_or_subfield,
                                                .x = percent_gender_gap))

# Dumbbell plot
ggplot(fields_wide) +
  geom_linerange(aes(y = subject_area_or_subfield,
                     xmin = percent_field_Women, xmax = percent_field_Men)) +
  geom_point(aes(x = percent_field_Women, y = subject_area_or_subfield, color = "Women"),
             size = 2.5) +
  geom_point(aes(x = percent_field_Men, y = subject_area_or_subfield, color = "Men"),
             size = 2.5) +
  geom_vline(xintercept = .5, linetype = "dashed", color = "gray40") +
  scale_x_continuous(breaks = seq(0, 1, by = 0.1),
                     labels = scales::percent_format(scale = 100)) +
  scale_color_manual(values = c("Women" = "#ec9bfc", "Men" = "#6A1E99")) +
  labs(title = "Men Publish More Across Most Academic Fields",
       subtitle = "Percentage of total academic publications by men and women",
       x = "Percentage of Total Publications",
       y = "",
       color = "Gender") +
  theme_minimal(base_size = 18) + # was 20 for png
  theme(
    # legend.position = "none",
    text = element_text(family = "lexend"),
    plot.title = element_text(face = "bold"),
    # plot.subtitle = ggtext::element_textbox(family = "sen",
    #                                         size = rel(1.1),
    #                                         color = "black",
    #                                         width = unit(35, 'cm'),
    #                                         padding = margin(t = 5, r = 0, b = 5, l = 0),
    #                                         margin = margin(t = 2, r = 0, b = 6, l = 0)),
    axis.title = element_text(size = rel(1)),
    axis.text.x = element_text(size = rel(1), face = "bold"),
    axis.text.y = element_text(size = rel(0.8), face = "bold", color = "black"),
    plot.background = element_rect(fill = "white", color = NA)
  )
What challenges did you encounter or anticipate encountering as you continue to build / iterate on your visualizations in R? If you struggled with mocking up any of your three visualizations (from #6, above), describe those challenges here.
What ggplot extension tools / packages do you need to use to build your visualizations? Are there any that we haven’t covered in class that you’ll be learning how to use for your visualizations?
Currently, I’m using tidyverse, glue, showtext, ggtext, scales, and patchwork. The only one we haven’t explicitly covered in this class is patchwork.
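Since patchwork is the new package here, a minimal sketch of its core operators may help: `|` (or `+`) places plots side by side, `/` stacks them vertically, and `plot_annotation()` adds a shared title and panel tags. The two plots below are throwaway examples on R's built-in `mtcars` data, standing in for the project's actual charts.

```r
library(ggplot2)
library(patchwork)

# Two throwaway plots standing in for the project's charts
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()

# `|` places plots side by side; `p1 / p2` would stack them instead
combined <- (p1 | p2) +
  plot_annotation(title = "Two panels combined with patchwork",
                  tag_levels = "A")

combined
```

The same pattern could pair, for example, the discipline chart with a country chart under one shared title.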
What feedback do you need from the instructional team and / or your peers to ensure that your intended message is clear?
I think feedback on the overarching question would be nice. It still feels a little weak to me, but I also think there are a lot of constraints on what I can ask, given the quality of the data (and the amount of time I can put into this project).